optimize memorynew intrinsic for constant length Memory #55913
@gbaraldi so with LLVM assertions enabled I'm getting
which is on the line that does |
I'd print everyone involved here with the way I showed you yesterday |
This now works! For simple examples like |
As an example of what is possible, alloc-opt was able to go from
define i64 @julia_f_769() #0 !dbg !5 {
top:
%pgcstack = call ptr @julia.get_pgcstack()
%current_task1 = getelementptr inbounds i8, ptr %pgcstack, i64 -112, !dbg !14
%memoryref_mem = call dereferenceable(40) ptr addrspace(10) @julia.gc_alloc_obj(ptr nonnull %current_task1, i64 40, ptr addrspace(10) addrspacecast (ptr @"+Core.GenericMemory#771.jit" to ptr addrspace(10))), !dbg !14
%0 = addrspacecast ptr addrspace(10) %memoryref_mem to ptr addrspace(11), !dbg !14
%1 = getelementptr inbounds { i64, ptr }, ptr addrspace(11) %0, i64 0, i32 1, !dbg !14
%2 = call nonnull ptr @julia.pointer_from_objref(ptr addrspace(11) %0) #4, !dbg !14
%3 = getelementptr inbounds i8, ptr %2, i64 16, !dbg !14
store ptr %3, ptr addrspace(11) %1, align 8, !dbg !14
store i64 3, ptr addrspace(11) %0, align 8, !dbg !14
%memoryref_data4 = call ptr addrspace(13) @julia.gc_loaded(ptr addrspace(10) %memoryref_mem, ptr %3), !dbg !15
store i64 2, ptr addrspace(13) %memoryref_data4, align 8, !dbg !15, !tbaa !20, !alias.scope !24, !noalias !27
%memoryref_data11 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 8, !dbg !32
store i64 4, ptr addrspace(13) %memoryref_data11, align 8, !dbg !32, !tbaa !20, !alias.scope !24, !noalias !27
%memoryref_data18 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 16, !dbg !34
store i64 5, ptr addrspace(13) %memoryref_data18, align 8, !dbg !34, !tbaa !20, !alias.scope !24, !noalias !27
ret i64 11, !dbg !36
}
to the following, removing the allocation, which would likely allow it to just return the 11:
define i64 @julia_f_769() #0 !dbg !5 {
top:
%memoryref_mem = alloca [40 x i8], align 16
%pgcstack = call ptr @julia.get_pgcstack()
%current_task1 = getelementptr inbounds i8, ptr %pgcstack, i64 -112, !dbg !14
call void @llvm.lifetime.start.p0(i64 40, ptr %memoryref_mem)
%0 = freeze [40 x i8] undef, !dbg !14
store [40 x i8] %0, ptr %memoryref_mem, align 1, !dbg !14
%1 = getelementptr inbounds { i64, ptr }, ptr %memoryref_mem, i64 0, i32 1, !dbg !14
%2 = getelementptr inbounds i8, ptr %memoryref_mem, i64 16, !dbg !14
store ptr %2, ptr %1, align 8, !dbg !14
store i64 3, ptr %memoryref_mem, align 8, !dbg !14
%memoryref_data4 = call ptr addrspace(13) @julia.gc_loaded(ptr addrspace(10) null, ptr %2), !dbg !15
store i64 2, ptr addrspace(13) %memoryref_data4, align 8, !dbg !15, !tbaa !20, !alias.scope !24, !noalias !27
%memoryref_data11 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 8, !dbg !32
store i64 4, ptr addrspace(13) %memoryref_data11, align 8, !dbg !32, !tbaa !20, !alias.scope !24, !noalias !27
%memoryref_data18 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 16, !dbg !34
store i64 5, ptr addrspace(13) %memoryref_data18, align 8, !dbg !34, !tbaa !20, !alias.scope !24, !noalias !27
ret i64 11, !dbg !36
}
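For context, a function along these lines would produce IR of that shape. This is a hypothetical reconstruction (the actual source of `f` isn't shown above), assuming Julia 1.11 or later, where `Memory{T}(undef, n)` is available:

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

# Hypothetical stand-in for the `f` in the IR above: a 3-element
# Memory{Int} that never escapes the function.
function f()
    mem = Memory{Int}(undef, 3)
    mem[1] = 2
    mem[2] = 4
    mem[3] = 5
    return mem[1] + mem[2] + mem[3]
end

# Prints the final optimized IR; whether the allocation is gone
# depends on the Julia build and on flags such as --check-bounds.
@code_llvm f()
```

The 40 bytes in the IR would correspond to a 16-byte header (length plus data pointer) followed by three 8-byte elements.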
6222082 to b65a483
b65a483 to 724b8c5
Can you please add an llvm pass test for #56030 (comment) (removing all memory for a simple case where the Memory object doesn't escape)?
Do you want an actual LLVM pass, or can I just write a test for 0 allocations?
I think an llvm test would be more robust, but probably a simple zero-allocation test would do the job as well.
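A zero-allocation check of the kind discussed could look roughly like the sketch below. The function and helper names are made up for illustration, and it assumes Julia 1.11 or later. It reports the measured bytes instead of asserting zero, since the result depends on whether the build includes this optimization and on flags like `--check-bounds=yes`:

```julia
using Test

# Hypothetical non-escaping case: the Memory is only used locally.
function sum_small_memory()
    mem = Memory{Int}(undef, 3)
    mem[1] = 2
    mem[2] = 4
    mem[3] = 5
    return mem[1] + mem[2] + mem[3]
end

# Wrap the measurement in a function so top-level overhead is not counted.
measure_allocs() = @allocated sum_small_memory()

measure_allocs()            # warm-up call so compilation is not measured
allocs = measure_allocs()

@test sum_small_memory() == 11
# With the optimization in place one would expect:
#     @test allocs == 0
# Left as a comment here because it would fail on builds without it.
println("bytes allocated: ", allocs)
```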
LOL. This test is so good it broke a doctest in performance tips. We're testing to show that you get allocations if you have "bad" code that allocates arrays, but now it doesn't allocate 😆
2c2b098 to e6e26ab
This is now on top of #55995 (to figure out why we weren't optimizing correctly), but other than that, I think this is good to go! |
Maybe a test of no allocations in simple cases as discussed above? 🙂 |
Again, my point was never that a PkgEval run would be useless. As I tried to make clear multiple times, I was just expressing concern that it wouldn't have caught nsajko's example (though with Tim's PR it hopefully would now). The initial run seemed a bit redundant since the reported miscompilation hadn't been fixed yet, but it's not my infrastructure so I don't really care if people overuse it.
Oh, this is very unfortunate. This PR as is (without any of the alloc-opt optimizations) is able to expose badness in our current LLVM semantics. @gbaraldi this is unexpected, right? |
This isn't necessarily unexpected. The pointer_from_objref we do there is already quite new. Most of the alloc-opt changes are teaching it how to handle code it didn't have to handle before, not new optimizations per se.
that's unfortunate. I guess we actually have to stop lying to LLVM then 😢 |
dfef35a to 3e780d3
@nsajko thanks for the example you gave! Turns out it simplifies to
(which will get added to the tests) |
tuple test fixed (we had invalid TBAA on |
@nanosoldier |
@maleadt @KristofferC any idea why my nanosoldier seems to be stalled? |
It's not. There's other jobs running.
The run also served to test the new |
The package evaluation job you requested has completed - possible new issues were detected. |
@nanosoldier |
429910d to e003251
the NearestNeighbors issue isn't reproducing locally which is a little scary. |
The package evaluation job you requested has completed - possible new issues were detected. |
so I've investigated 3 of these so far, and 2 were package bugs, and one doesn't reproduce locally. we definitely are getting close |
@nanosoldier |
The package evaluation job you requested has completed - possible new issues were detected. |
I don't know if squashing the whole commit history here was great. It's hard to see how different bug fixes were added and if those have corresponding tests now etc.
@@ -3295,3 +3295,44 @@ end
     ref = memoryref(mem, 2)
     @test parent(ref) === mem
 end

# some tests marked broken because CI runs with check-bounds=yes which impedes escape analysis
Put it in https://github.com/JuliaLang/julia/blob/master/test/boundscheck_exec.jl or start a new process with the required flags?
A concerning number of these are failing from within type inference which is highly unfortunate. I think we're likely constant folding code that we're miscompiling, but it's hard to know for sure. |
There are also these precompile failures in some of the failed packages:
I wonder if there is some corruption that happens that causes the precompile process to not work properly. |
Co-authored-by: Jameson Nash
Co-authored-by: Jeff Bezanson
Co-authored-by: Gabriel Baraldi
e003251 to bdb29cf
This speeds up making new `Memory`s and allows the compiler to better understand what's going on, allowing for LLVM-level escape analysis in some cases. There is more room to grow this: currently this only optimizes fairly small `Memory`, since bigger ones would require writing some more LLVM code, and we probably want a size limit on putting `Memory` on the stack to avoid stack overflow. For larger ones, we could potentially inline the `free` so the `Memory` doesn't have to be swept by the GC, etc.

Benchmarks: